Climate change#
In the following sections, we use Python to demonstrate how to access multiples datasets from the Atmosphere sub-catalog.
Environment setup#
[1]:
from distributed import Client
import intake
import hvplot.xarray
import hvplot.pandas
from dask.distributed import PipInstall
import dask
import xoak
import xarray as xr
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import xarray as xr
import numpy as np
import dask
from dask.diagnostics import progress
from tqdm.autonotebook import tqdm
import fsspec
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
/tmp/ipykernel_3240/326864322.py:16: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from tqdm.autonotebook import tqdm
We use a Dask client to ensure all following code compatible with the framework run in parallel
[2]:
client = Client()
client
[2]:
Client
Client-c21979c0-6a5d-11ed-8ca8-000d3aecfde1
| Connection method: Cluster object | Cluster type: distributed.LocalCluster |
| Dashboard: http://127.0.0.1:8787/status |
Cluster Info
LocalCluster
3691c4ea
| Dashboard: http://127.0.0.1:8787/status | Workers: 2 |
| Total threads: 2 | Total memory: 6.78 GiB |
| Status: running | Using processes: True |
Scheduler Info
Scheduler
Scheduler-33b7fd71-38f4-47dd-93ba-1c903698611e
| Comm: tcp://127.0.0.1:41255 | Workers: 2 |
| Dashboard: http://127.0.0.1:8787/status | Total threads: 2 |
| Started: Just now | Total memory: 6.78 GiB |
Workers
Worker: 0
| Comm: tcp://127.0.0.1:35009 | Total threads: 1 |
| Dashboard: http://127.0.0.1:43439/status | Memory: 3.39 GiB |
| Nanny: tcp://127.0.0.1:46693 | |
| Local directory: /tmp/dask-worker-space/worker-loznmw9h | |
Worker: 1
| Comm: tcp://127.0.0.1:39961 | Total threads: 1 |
| Dashboard: http://127.0.0.1:39459/status | Memory: 3.39 GiB |
| Nanny: tcp://127.0.0.1:36299 | |
| Local directory: /tmp/dask-worker-space/worker-c3s03jf5 | |
Accessing the data#
We are now ready to access our catalog which uses Intake to organize all our datasets.
Intake is a lightweight package for finding, investigating, loading and disseminating data. A cataloging system is used to organize a collection of datasets and data loaders (drivers) are parameterized such that datasets are opened in the desired format for the end user. In the python context, multi-dimensional xarrays could be opened with xarray’s drivers while polygons (shapefiles, geojson) could be opened with geopandas.
Here is the URL from where we can open the catalog:
a) CMIP6#
In order to arrange the collection of datasets, the catalogue itself makes references to various sub-catalogs:
[3]:
col = intake.open_esm_datastore('https://storage.googleapis.com/cmip6/pangeo-cmip6.json')
col
pangeo-cmip6 catalog with 7674 dataset(s) from 514818 asset(s):
| unique | |
|---|---|
| activity_id | 18 |
| institution_id | 36 |
| source_id | 88 |
| experiment_id | 170 |
| member_id | 657 |
| table_id | 37 |
| variable_id | 700 |
| grid_label | 10 |
| zstore | 514818 |
| dcpp_init_year | 60 |
| version | 736 |
| derived_variable_id | 0 |
[4]:
col.df['variable_id'].unique()
[4]:
array(['ps', 'rsds', 'rlus', 'rlds', 'psl', 'hurs', 'huss', 'hus', 'hfss',
'rsus', 'evspsbl', 'rsdt', 'hfls', 'rsut', 'clt', 'zg', 'ts', 'va',
'uas', 'vas', 'tas', 'ta', 'ua', 'pr', 'tauv', 'prc', 'tauu',
'rsutcs', 'wtem', 'vtem', 'prw', 'prsn', 'rlut', 'rlutcs',
'tasmax', 'tasmin', 'emidust', 'emiss', 'mmrbc', 'mmrdust',
'mmroa', 'mmrpm2p5', 'o3', 'mmrsoa', 'mmrss', 'od550lt1aer', 'oh',
'emidms', 'mmrso4', 'cltc', 'ptp', 'airmass', 'ccb', 'cdnc', 'toz',
'so2', 'rsutcsaf', 'wa', 'rlutcsaf', 'rlutaf', 'od870aer',
'od550aer', 'abs550aer', 'rsutaf', 'snw', 'mrsos', 'mrso', 'mrro',
'areacella', 'siconc', 'basin', 'mrros', 'mlotst', 'clivi', 'hur',
'sfdsi', 'co2mass', 'rsntds', 'masso', 'soga', 'evspsblsoi', 'sos',
'sosga', 'tauuo', 'sfcWind', 'clwvi', 'vo', 'vmo', 'uo', 'umo',
'tosga', 'tauvo', 'tos', 'thetao', 'thetaoga', 'pbo', 'thkcello',
'orog', 'volo', 'wfo', 'cllcalipso', 'evspsblpot', 'wap', 'mrsol',
'so', 'wmo', 'rld', 'evspsblveg', 'lai', 'prveg', 'zos',
'bigthetaoga', 'deptho', 'hfgeou', 'masscello', 'bigthetao', 'siv',
'tran', 'wo', 'sisnmass', 'simass', 'siarean', 'siconca',
'siextentn', 'sisnconc', 'sispeed', 'sisnthick', 'zostoga',
'sitemptop', 'sithick', 'sitimefrac', 'siu', 'areacello', 'sivol',
'siareas', 'sivols', 'gpp', 'hursmax', 'hursmin', 'sndmasssnf',
'sivoln', 'sfcWindmax', 'rootd', 'sftgif', 'hfbasin', 'hfds',
'areacellr', 'mrsofc', 'wap500', 'albisccp', 'ta700', 'utendepfd',
'rldscs', 'cltisccp', 'rsuscs', 'clcalipso', 'rsdscs', 'rtmt',
'sci', 'clhcalipso', 'sftlf', 'zg1000', 'rv850', 'utendnogw',
'utendvtem', 'tslsi', 'tnhusc', 'dryoa', 'c3h8', 'loadbc',
'cldnvi', 'tntr', 'tntmp', 'tntc', 'c2h6', 'tnta', 'zg500', 'tnt',
'tnhusa', 'rsdsdiff', 'rsdscsdiff', 'loadsoa', 'loadpoa',
'clmcalipso', 'clisccp', 'loadss', 'tnhusmp', 'tnhus', 'loaddust',
'loadoa', 'loadso4', 'psitem', 'epfz', 'epfy', 'cltcalipso',
'cfc12', 'sf6', 'cfc11', 'epc100', 'dpo2', 'sios',
'treeFracNdlEvg', 'treeFracBdlEvg', 'treeFracBdlDcd', 'utendogw',
'utendwtem', 'tntnogw', 'tntogw', 'tntrl', 'tntrlcs', 'grassFrac',
'vegFrac', 'vtendogw', 'vtendnogw', 'mrfso', 'zmeso', 'volcello',
'si', 'talk', 'spco2', 'talkos', 'cropFrac', 'zmesoos', 'pon',
'snm', 'snd', 'snc', 'hfdsn', 'phymisc', 'pp', 'phyfeos',
'phydiaz', 'phydiatos', 'phydiat', 'phycos', 'phyc', 'phos',
'phydiazos', 'dissicos', 'limfepico', 'limfemisc', 'limfediaz',
'limfediat', 'intpppico', 'epp100', 'epn100', 'epfe100',
'chlmiscos', 'chldiatos', 'chl', 'calcos', 'bsios', 'bfeos',
'baccos', 'aragos', 'tsl', 'chlpicoos', 'epcalc100', 'chlos',
'chldiazos', 'limirrdiat', 'nbp', 'pastureFrac', 'ra', 'rh',
'treeFrac', 'npp', 'limirrdiaz', 'limirrmisc', 'limirrpico',
'intdic', 'o2sat', 'intdoc', 'intpbfe', 'intpbn', 'intpbp',
'intpcalcite', 'intpn2', 'intpoc', 'intpp', 'intparag',
'eparag100', 'intpbsi', 'graz', 'fgco2', 'limndiat', 'limnmisc',
'limnpico', 'nh4', 'nh4os', 'no3', 'fgo2', 'no3os', 'intppnitrate',
'intppmisc', 'intppdiaz', 'intppdiat', 'epsi100', 'expc', 'o2',
'dissocos', 'expcalc', 'expfe', 'ph', 'pcalc', 'pbsi', 'pbfe',
'parag', 'fescav', 'fediss', 'expsi', 'phynos', 'phypico',
'phypicoos', 'phypos', 'physios', 'po4', 'ponos', 'zmicro',
'zmicroos', 'zooc', 'expn', 'expp', 'po4os', 'exparag', 'zoocos',
'arag', 'o2satos', 'o2os', 'co3', 'co3os', 'co3satarag',
'co3sataragos', 'co3satcalc', 'detocos', 'dfe', 'dfeos', 'dissic',
'dissoc', 'co3satcalcos', 'dpco2', 'chlmisc', 'bacc', 'bfe', 'bsi',
'calc', 'chldiat', 'chlpico', 'detoc', 'chldiaz', 'phyfe', 'phyn',
'phyp', 'cct', 'cfc113global', 'cfc11global', 'cfc12global',
'ch4global', 'ci', 'cli', 'clw', 'cl', 'physi', 'pnitrate', 'pop',
'ppdiat', 'ppmisc', 'pppico', 'ppdiaz', 'n2oglobal', 'phymiscos',
'hus850', 'ta500', 'co23D', 'co2s', 'cProduct', 'cVeg',
'hcfc22global', 'cLand', 'concdust', 'sconcss', 'sconcso4',
'sconcdust', 'rhLut', 'raLut', 'nppLut', 'nep', 'vegHeight',
'gppLut', 'laiLut', 'tntrs', 'tatp', 'emico', 'emiisop', 'emilnox',
'eminh3', 'emibc', 'lossn2o', 'emibvoc', 'lossco', 'lossch4',
'jno2', 'isop', 'hcl', 'hcho', 'emiso2', 'emioa', 'eminox', 'hno3',
'rsd', 'rsdcs', 'n2o', 'dryso4', 'rsu', 'rsucs', 'ch3coch3', 'ch4',
'cheaqpso4', 'nh50', 'no', 'mmrpm10', 'mmrno3', 'mmrnh4',
'od550bc', 'od550dust', 'mmrpm1', 'no2', 'o3loss', 'od440aer',
'o3ste', 'o3prod', 'dryso2', 'dryo3', 'drynoy', 'drynh4', 'drynh3',
'dryss', 'drydust', 'dms', 'co2', 'co', 'chepsoa', 'chegpso4',
'drybc', 'lwp', 'reffclwtop', 'photo1d', 'pan', 'od550ss',
'od550soa', 'od550so4', 'od550oa', 'od550no3', 'tntrscs', 'c3h6',
'bldep', 'aoanh', 'tntscp', 'cSoil', 'msftyz', 'cLitter', 'fsfe',
'dcalc', 'bddtdip', 'bddtdife', 'bddtdic', 'bddtalk', 'remoc',
'fFireNat', 'fLuc', 'fProductDecomp', 'fracLut', 'hflsLut',
'hfssLut', 'hussLut', 'fAnthDisturb', 'ec550aer', 'mrlso', 'mrsfl',
'mc', 'wetdust', 'wetnh3', 'wetnh4', 'wetnoy', 'wetoa', 'wetso2',
'wetso4', 'wetss', 'ztp', 'wetbc', 'tropoz', 'meanage', 'mrsll',
'tasLut', 'mrroLut', 'tslsiLut', 'mrsoLut', 'tnhuspbl', 'tnhusscp',
'tntpbl', 'mrsosLut', 'chepasoa', 'netAtmosLandCO2Flux', 'rls',
'rlusLut', 'rss', 'rsusLut', 'mfo', 'edt', 'evu', 'emiaoa',
'emiso4', 'somint', 'rsdoabsorb', 'osalttend', 'osaltrmadvect',
'hfcorr', 'osaltpsmadvect', 'osaltpmdiff', 'opottemppmdiff',
'opottemppadvect', 'opottempmint', 'opottempdiff',
'opottemppsmadvect', 'opottemprmadvect', 'opottemptend',
'osaltpadvect', 'osaltdiff', 'prthetao', 'pathetao', 'wfcorr',
'tob', 'zossq', 'tossq', 'sossq', 'omldamax', 'friver', 'evs',
'sftof', 'mlotstmax', 'msftyrho', 'pso', 'msftyzsmpa', 'mlotstmin',
'sob', 'parasolRefl', 'clwmodis', 'cltmodis', 'clmisr', 'climodis',
'clcalipsoice', 'clcalipsoliq', 'sfpm25', 'sfo3', 'sfno2',
'cVegLut', 'cSoilLut', 'rsdo', 'pfull', 'phalf', 'icfriver',
'fHarvest', 'fFire', 'fGrazing', 'mlotstsq', 'tnpeo', 'zo2min',
'difvmto', 'tnkebto', 'dispkevfo', 'difvtrto', 'difvso', 'difvmo',
'difvho', 'diftrxylo', 'dispkexyfo', 'diftrelo', 'sftgrf',
'diftrblo', 'fsn', 'fsitherm', 'agessc', 'prra', 'o2min',
'msftbarot', 'froc', 'fric', 'ficeberg', 'fbddtdip', 'fbddtdin',
'fbddtdife', 'fbddtdic', 'fbddtalk', 'fbddtdisi', 'rlu',
'baresoilFrac', 'clayfrac', 'ksat', 'sandfrac', 'slthick', 'tdps',
'prhmax', 'hcont300', 'ocontempdiff', 'ocontemppadvect',
'ocontemppmdiff', 'ocontemprmadvect', 'ocontemptend', 'cLitterLut',
'cSoilSlow', 'cProductLut', 'cSoilFast', 'residualFrac',
'cSoilMedium', 'cRoot', 'cLeaf', 'cLitterAbove', 'cLitterBelow',
'hfy', 'obvfsq', 'ta850', 'cod', 'ttop', 'msftyzmpa', 'vsfsit',
'wilt', 'siltfrac', 'fldcapacity', 'simpmass', 'simpconc', 'dmso',
'ocfriver', 'hfx', 'burntFractionAll', 'zfull', 'shrubFrac',
'hfgeoubed', 'lithk', 'topg', 'sbl', 'msftmzmpa', 'cCwd',
'tasminCrop', 'tasmaxCrop', 'msftmzsmpa', 'zsatarag', 'spco2abio',
'bddtdin', 'bddtdisi', 'fddtalk', 'dpco2nat', 'vsf', 'phabioos',
'phnatos', 'ppos', 'rlntds', 'fddtdisi', 'fgco2abio', 'fgco2nat',
'frfe', 'frn', 'dissicnatos', 'fddtdic', 'dpco2abio', 'fddtdife',
'fddtdin', 'fddtdip', 'dissicnat', 'dissicabio', 'msftmz', 'tsn',
'talknat', 'dissi14cabio', 'fg14co2abio', 'spco2nat', 'co3nat',
'phnat', 'fco2nat', 'fracInLut', 'fracOutLut', 'chlcalcos',
'chlcalc', 'phycalc', 'msftmrho', 'sidmassmeltbot', 'sidmasssi',
'sifllwutop', 'siforcecorioly', 'sidmassth', 'siflcondtop',
'sidmassdyn', 'siforcecoriolx', 'siforceintstry',
'ocontemppsmadvect', 'difmxylo', 'msftyrhompa', 'msftmrhompa',
'phabio', 'co3abio', 'difmxybo'], dtype=object)
[5]:
col.df
[5]:
| activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | zstore | dcpp_init_year | version | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HighResMIP | CMCC | CMCC-CM2-HR4 | highresSST-present | r1i1p1f1 | Amon | ps | gn | gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... | NaN | 20170706 |
| 1 | HighResMIP | CMCC | CMCC-CM2-HR4 | highresSST-present | r1i1p1f1 | Amon | rsds | gn | gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... | NaN | 20170706 |
| 2 | HighResMIP | CMCC | CMCC-CM2-HR4 | highresSST-present | r1i1p1f1 | Amon | rlus | gn | gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... | NaN | 20170706 |
| 3 | HighResMIP | CMCC | CMCC-CM2-HR4 | highresSST-present | r1i1p1f1 | Amon | rlds | gn | gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... | NaN | 20170706 |
| 4 | HighResMIP | CMCC | CMCC-CM2-HR4 | highresSST-present | r1i1p1f1 | Amon | psl | gn | gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... | NaN | 20170706 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 514813 | CMIP | EC-Earth-Consortium | EC-Earth3-Veg | historical | r1i1p1f1 | Amon | tas | gr | gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... | NaN | 20211207 |
| 514814 | CMIP | EC-Earth-Consortium | EC-Earth3-Veg | historical | r1i1p1f1 | Amon | tauu | gr | gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... | NaN | 20211207 |
| 514815 | CMIP | EC-Earth-Consortium | EC-Earth3-Veg | historical | r1i1p1f1 | Amon | hur | gr | gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... | NaN | 20211207 |
| 514816 | CMIP | EC-Earth-Consortium | EC-Earth3-Veg | historical | r1i1p1f1 | Amon | hus | gr | gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... | NaN | 20211207 |
| 514817 | CMIP | EC-Earth-Consortium | EC-Earth3-Veg | historical | r1i1p1f1 | Amon | tauv | gr | gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... | NaN | 20211207 |
514818 rows × 11 columns
The following examples will focus on datasets from the Atmosphere sub-catalog.
Even though our catalogue is constantly expanding, some datasets are already available. The next sections contain several examples of queries as well as analyses of various ones.
The current Atmosphere catalog is presented below in a table format. A dataset should be used after consulting the status field. If a dataset has a “dev” flag, it signifies that we are actively working on it and do not recommend using it. It is production-ready if it has a “prod” flag. The “prod” label signifies that the dataset has undergone quality review and testing, however users should always double-check on their own because errors are still possible.
[6]:
# load a few models to illustrate the problem
query = dict(experiment_id=["ssp585"],
variable_id="tasmax",
grid_label="gn",
table_id='Amon',
member_id='r1i1p1f1'
)
cat = col.search(**query)
xarray_kwargs = {'consolidated': True, 'decode_times':False}
with dask.config.set(**{'array.slicing.split_large_chunks': True}):
dset_dict = cat.to_dataset_dict(xarray_open_kwargs=xarray_kwargs)
--> The keys in the returned dictionary of datasets are constructed as follows:
'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
[7]:
[key for key in dset_dict.keys()]
[7]:
['ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp585.Amon.gn',
'ScenarioMIP.NUIST.NESM3.ssp585.Amon.gn',
'ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.Amon.gn',
'ScenarioMIP.CCCma.CanESM5.ssp585.Amon.gn',
'ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.Amon.gn',
'ScenarioMIP.MRI.MRI-ESM2-0.ssp585.Amon.gn',
'ScenarioMIP.NCAR.CESM2-WACCM.ssp585.Amon.gn',
'ScenarioMIP.FIO-QLNM.FIO-ESM-2-0.ssp585.Amon.gn',
'ScenarioMIP.MIROC.MIROC6.ssp585.Amon.gn',
'ScenarioMIP.BCC.BCC-CSM2-MR.ssp585.Amon.gn',
'ScenarioMIP.DKRZ.MPI-ESM1-2-HR.ssp585.Amon.gn',
'ScenarioMIP.CAS.FGOALS-g3.ssp585.Amon.gn']
[8]:
ds = dset_dict[list(dset_dict.keys())[0]]
ds
[8]:
<xarray.Dataset>
Dimensions: (lat: 96, bnds: 2, lon: 192, member_id: 1,
dcpp_init_year: 1, time: 1032)
Coordinates:
height float64 ...
* lat (lat) float64 -88.57 -86.72 -84.86 ... 84.86 86.72 88.57
lat_bnds (lat, bnds) float64 dask.array<chunksize=(96, 2), meta=np.ndarray>
* lon (lon) float64 0.0 1.875 3.75 5.625 ... 354.4 356.2 358.1
lon_bnds (lon, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
* time (time) int64 0 708 1416 2148 ... 750924 751656 752388 753120
time_bnds (time, bnds) float64 dask.array<chunksize=(1032, 2), meta=np.ndarray>
* member_id (member_id) object 'r1i1p1f1'
* dcpp_init_year (dcpp_init_year) float64 nan
Dimensions without coordinates: bnds
Data variables:
tasmax (member_id, dcpp_init_year, time, lat, lon) float32 dask.array<chunksize=(1, 1, 516, 96, 192), meta=np.ndarray>
Attributes: (12/63)
Conventions: CF-1.7 CMIP-6.2
activity_id: ScenarioMIP
branch_method: standard
branch_time_in_child: 60265.0
branch_time_in_parent: 60265.0
cmor_version: 3.5.0
... ...
intake_esm_attrs:variable_id: tasmax
intake_esm_attrs:grid_label: gn
intake_esm_attrs:zstore: gs://cmip6/CMIP6/ScenarioMIP/MPI-M/MPI-...
intake_esm_attrs:version: 20190710
intake_esm_attrs:_data_format_: zarr
intake_esm_dataset_key: ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp585....[9]:
fig, axarr = plt.subplots(nrows=4, ncols=3, figsize=[30,20])
for ax,(k, ds) in zip(axarr.flat,dset_dict.items()):
if 'member_id' in ds.dims:
ds = ds.isel(member_id=0)
ds.coords['lon'] = (ds.coords['lon'] + 180) % 360 - 180
ds = ds.sortby(ds.lon)
da = ds.tasmax.isel(time=0).squeeze().plot.pcolormesh(ax=ax, cmap='coolwarm')
ax.set_title(k)
[10]:
fig, axarr = plt.subplots(nrows=4, ncols=3, figsize=[30,20])
for ax,(k, ds) in zip(axarr.flat,dset_dict.items()):
if 'member_id' in ds.dims:
ds = ds.isel(member_id=0)
da = ds.tasmax.sel(lon=280, lat=45, method='nearest').squeeze().plot(ax=ax, color='blue')
ax.set_title(k)
ERA5 is the fifth generation ECMWF atmospheric reanalysis of the global climate covering the period from January 1950 to present. ERA5 is produced by the Copernicus Climate Change Service (C3S) at ECMWF.
Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product.
Property |
Values |
|---|---|
Temporal extent: |
01/01/1979 – 12/31/2020 |
Spatial extent: |
World : [-180, 180, -90, 90] |
Chunks (timeseries’s version): |
{‘time’: 14880, ‘longitude’: 15, ‘latitude’: 15} |
Chunks (spatial’s version): |
{‘time’: 24, ‘longitude’: 1440, ‘latitude’: 721} |
Spatial resolution: |
0.25 degrees |
Spatial reference: |
WGS84 (EPSG:4326) |
Temporal resolution: |
1 hour |
Update frequency: |
In 2023, we will update it weekly |
[11]:
[eid for eid in col.df['experiment_id'].unique() if 'ssp' in eid]
[11]:
['ssp585',
'ssp245',
'ssp370SST-lowCH4',
'ssp370-lowNTCF',
'ssp370SST-lowNTCF',
'ssp370SST-ssp126Lu',
'ssp370SST',
'ssp370pdSST',
'ssp119',
'ssp370',
'esm-ssp585-ssp126Lu',
'ssp126-ssp370Lu',
'ssp370-ssp126Lu',
'ssp126',
'esm-ssp585',
'ssp245-GHG',
'ssp245-nat',
'ssp460',
'ssp434',
'ssp534-over',
'ssp245-stratO3',
'ssp245-aer',
'ssp245-cov-modgreen',
'ssp245-cov-fossil',
'ssp245-cov-strgreen',
'ssp245-covid',
'ssp585-bgc']
[12]:
# there is currently a significant amount of data for these runs
expts = ['historical', 'ssp245', 'ssp585']
query = dict(
experiment_id=expts,
table_id='Amon',
variable_id=['tas'],
member_id = 'r1i1p1f1',
)
col_subset = col.search(require_all_on=["source_id"], **query)
col_subset.df.groupby("source_id")[
["experiment_id", "variable_id", "table_id"]
].nunique()
/usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/intake_esm/_search.py:80: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
for _, group in grouped:
[12]:
| experiment_id | variable_id | table_id | |
|---|---|---|---|
| source_id | |||
| ACCESS-CM2 | 3 | 1 | 1 |
| AWI-CM-1-1-MR | 3 | 1 | 1 |
| BCC-CSM2-MR | 3 | 1 | 1 |
| CAMS-CSM1-0 | 3 | 1 | 1 |
| CAS-ESM2-0 | 3 | 1 | 1 |
| CESM2-WACCM | 3 | 1 | 1 |
| CIESM | 3 | 1 | 1 |
| CMCC-CM2-SR5 | 3 | 1 | 1 |
| CMCC-ESM2 | 3 | 1 | 1 |
| CanESM5 | 3 | 1 | 1 |
| E3SM-1-1 | 3 | 1 | 1 |
| EC-Earth3 | 3 | 1 | 1 |
| EC-Earth3-CC | 3 | 1 | 1 |
| EC-Earth3-Veg | 3 | 1 | 1 |
| EC-Earth3-Veg-LR | 3 | 1 | 1 |
| FGOALS-f3-L | 3 | 1 | 1 |
| FGOALS-g3 | 3 | 1 | 1 |
| FIO-ESM-2-0 | 3 | 1 | 1 |
| GFDL-CM4 | 3 | 1 | 1 |
| GFDL-ESM4 | 3 | 1 | 1 |
| IITM-ESM | 3 | 1 | 1 |
| INM-CM4-8 | 3 | 1 | 1 |
| INM-CM5-0 | 3 | 1 | 1 |
| IPSL-CM6A-LR | 3 | 1 | 1 |
| KACE-1-0-G | 3 | 1 | 1 |
| KIOST-ESM | 3 | 1 | 1 |
| MIROC6 | 3 | 1 | 1 |
| MPI-ESM1-2-HR | 3 | 1 | 1 |
| MPI-ESM1-2-LR | 3 | 1 | 1 |
| MRI-ESM2-0 | 3 | 1 | 1 |
| NESM3 | 3 | 1 | 1 |
| NorESM2-LM | 3 | 1 | 1 |
| NorESM2-MM | 3 | 1 | 1 |
| TaiESM1 | 3 | 1 | 1 |
[13]:
def drop_all_bounds(ds):
drop_vars = [vname for vname in ds.coords
if (('_bounds') in vname ) or ('_bnds') in vname]
return ds.drop(drop_vars)
def open_dset(df):
assert len(df) == 1
ds = xr.open_zarr(fsspec.get_mapper(df.zstore.values[0]), consolidated=True)
return drop_all_bounds(ds)
def open_delayed(df):
return dask.delayed(open_dset)(df)
from collections import defaultdict
dsets = defaultdict(dict)
for group, df in col_subset.df.groupby(by=['source_id', 'experiment_id']):
dsets[group[0]][group[1]] = open_delayed(df)
[14]:
dsets_ = dask.compute(dict(dsets))[0]
[15]:
# calculate global means
def get_lat_name(ds):
for lat_name in ['lat', 'latitude']:
if lat_name in ds.coords:
return lat_name
raise RuntimeError("Couldn't find a latitude coordinate")
def global_mean(ds):
lat = ds[get_lat_name(ds)]
weight = np.cos(np.deg2rad(lat))
weight /= weight.mean()
other_dims = set(ds.dims) - {'time'}
return (ds * weight).mean(other_dims)
[16]:
expt_da = xr.DataArray(expts, dims='experiment_id', name='experiment_id',
coords={'experiment_id': expts})
dsets_aligned = {}
for k, v in tqdm(dsets_.items()):
expt_dsets = v.values()
if any([d is None for d in expt_dsets]):
print(f"Missing experiment for {k}")
continue
for ds in expt_dsets:
ds.coords['year'] = ds.time.dt.year
# workaround for
# https://github.com/pydata/xarray/issues/2237#issuecomment-620961663
dsets_ann_mean = [v[expt].pipe(global_mean)
.swap_dims({'time': 'year'})
.drop('time')
.coarsen(year=12).mean()
for expt in expts]
# align everything with the 4xCO2 experiment
dsets_aligned[k] = xr.concat(dsets_ann_mean, join='outer',
dim=expt_da)
100%|██████████| 34/34 [00:13<00:00, 2.46it/s]
[17]:
with progress.ProgressBar():
dsets_aligned_ = dask.compute(dsets_aligned)[0]
We can quickly choose data subsets in both space and time using xarray. Here, we choose July 19–20, 1996, a period when Quebec saw historically extreme precipitation (Canada). The graphic package hvplot can then be used to track the storm throughout the event.
[18]:
source_ids = list(dsets_aligned_.keys())
source_da = xr.DataArray(source_ids, dims='source_id', name='source_id',
coords={'source_id': source_ids})
big_ds = xr.concat([ds.reset_coords(drop=True)
for ds in dsets_aligned_.values()],
dim=source_da)
big_ds
[18]:
<xarray.Dataset>
Dimensions: (year: 451, experiment_id: 3, source_id: 34)
Coordinates:
* year (year) float64 1.85e+03 1.851e+03 ... 2.299e+03 2.3e+03
* experiment_id (experiment_id) <U10 'historical' 'ssp245' 'ssp585'
* source_id (source_id) <U16 'ACCESS-CM2' 'AWI-CM-1-1-MR' ... 'TaiESM1'
Data variables:
tas (source_id, experiment_id, year) float64 287.0 287.0 ... nan[ ]:
[19]:
df_all = big_ds.sel(year=slice(1900, 2100)).to_dataframe().reset_index()
df_all.head()
[19]:
| year | experiment_id | source_id | tas | |
|---|---|---|---|---|
| 0 | 1900.0 | historical | ACCESS-CM2 | 287.019917 |
| 1 | 1900.0 | historical | AWI-CM-1-1-MR | 286.958154 |
| 2 | 1900.0 | historical | BCC-CSM2-MR | 287.996260 |
| 3 | 1900.0 | historical | CAMS-CSM1-0 | 287.084974 |
| 4 | 1900.0 | historical | CAS-ESM2-0 | 287.263682 |
[20]:
sns.relplot(data=df_all,
x="year", y="tas", hue='experiment_id',
kind="line", errorbar='sd', aspect=2);
b) Cordex-NA#
[21]:
col = intake.open_esm_datastore('https://ncar-na-cordex.s3-us-west-2.amazonaws.com/catalogs/aws-na-cordex.json')
col
aws-na-cordex catalog with 330 dataset(s) from 330 asset(s):
| unique | |
|---|---|
| variable | 15 |
| standard_name | 10 |
| long_name | 18 |
| units | 10 |
| spatial_domain | 1 |
| grid | 2 |
| spatial_resolution | 2 |
| scenario | 6 |
| start_time | 3 |
| end_time | 4 |
| frequency | 1 |
| vertical_levels | 1 |
| bias_correction | 3 |
| na-cordex-models | 9 |
| path | 330 |
| derived_variable | 0 |
[22]:
# Show the first few lines of the catalog
col.df
[22]:
| variable | standard_name | long_name | units | spatial_domain | grid | spatial_resolution | scenario | start_time | end_time | frequency | vertical_levels | bias_correction | na-cordex-models | path | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | hurs | relative_humidity | Near-Surface Relative Humidity | % | north_america | NAM-22i | 0.25 deg | eval | 1979-01-01T12:00:00 | 2014-12-31T12:00:00 | day | 1 | raw | ['ERA-Int.CRCM5-UQAM', 'ERA-Int.CRCM5-OUR', 'E... | s3://ncar-na-cordex/day/hurs.eval.day.NAM-22i.... |
| 1 | hurs | relative_humidity | Near-Surface Relative Humidity | % | north_america | NAM-44i | 0.50 deg | eval | 1979-01-01T12:00:00 | 2015-12-31T12:00:00 | day | 1 | raw | ['ERA-Int.CRCM5-UQAM', 'ERA-Int.RegCM4', 'ERA-... | s3://ncar-na-cordex/day/hurs.eval.day.NAM-44i.... |
| 2 | hurs | relative_humidity | Near-Surface Relative Humidity | % | north_america | NAM-22i | 0.25 deg | hist-rcp45 | 1949-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | mbcn-Daymet | ['CanESM2.CanRCM4'] | s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA... |
| 3 | hurs | relative_humidity | Near-Surface Relative Humidity | % | north_america | NAM-22i | 0.25 deg | hist-rcp45 | 1949-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | mbcn-gridMET | ['CanESM2.CanRCM4'] | s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA... |
| 4 | hurs | relative_humidity | Near-Surface Relative Humidity | % | north_america | NAM-22i | 0.25 deg | hist-rcp45 | 1949-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | raw | ['GFDL-ESM2M.CRCM5-OUR', 'CanESM2.CRCM5-OUR', ... | s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 325 | vas | northward_wind | Northward Near-Surface Wind | m s-1 | north_america | NAM-44i | 0.50 deg | rcp45 | 2006-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | raw | ['MPI-ESM-LR.CRCM5-UQAM', 'CanESM2.CRCM5-UQAM'... | s3://ncar-na-cordex/day/vas.rcp45.day.NAM-44i.... |
| 326 | vas | northward_wind | Northward Near-Surface Wind (Bias-Adjusted) | m s-1 | north_america | NAM-22i | 0.25 deg | rcp85 | 2006-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | mbcn-gridMET | ['MPI-ESM-MR.CRCM5-UQAM', 'GEMatm-Can.CRCM5-UQ... | s3://ncar-na-cordex/day/vas.rcp85.day.NAM-22i.... |
| 327 | vas | northward_wind | Northward Near-Surface Wind | m s-1 | north_america | NAM-22i | 0.25 deg | rcp85 | 2006-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | raw | ['MPI-ESM-MR.CRCM5-UQAM', 'GEMatm-Can.CRCM5-UQ... | s3://ncar-na-cordex/day/vas.rcp85.day.NAM-22i.... |
| 328 | vas | northward_wind | Northward Near-Surface Wind (Bias-Adjusted) | m s-1 | north_america | NAM-44i | 0.50 deg | rcp85 | 2006-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | mbcn-gridMET | ['MPI-ESM-MR.CRCM5-UQAM', 'GEMatm-Can.CRCM5-UQ... | s3://ncar-na-cordex/day/vas.rcp85.day.NAM-44i.... |
| 329 | vas | northward_wind | Northward Near-Surface Wind | m s-1 | north_america | NAM-44i | 0.50 deg | rcp85 | 2006-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | raw | ['MPI-ESM-MR.CRCM5-UQAM', 'GEMatm-Can.CRCM5-UQ... | s3://ncar-na-cordex/day/vas.rcp85.day.NAM-44i.... |
330 rows × 15 columns
[23]:
data_var = 'tmax'
col_subset = col.search(
variable=data_var,
grid="NAM-44i",
bias_correction="raw",
scenario='rcp45'
)
col_subset
aws-na-cordex catalog with 1 dataset(s) from 1 asset(s):
| unique | |
|---|---|
| variable | 1 |
| standard_name | 1 |
| long_name | 1 |
| units | 1 |
| spatial_domain | 1 |
| grid | 1 |
| spatial_resolution | 1 |
| scenario | 1 |
| start_time | 1 |
| end_time | 1 |
| frequency | 1 |
| vertical_levels | 1 |
| bias_correction | 1 |
| na-cordex-models | 1 |
| path | 1 |
| derived_variable | 0 |
[24]:
col_subset.df
[24]:
| variable | standard_name | long_name | units | spatial_domain | grid | spatial_resolution | scenario | start_time | end_time | frequency | vertical_levels | bias_correction | na-cordex-models | path | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | tmax | air_temperature | Daily Maximum Near-Surface Air Temperature | degC | north_america | NAM-44i | 0.50 deg | rcp45 | 2006-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | raw | ['MPI-ESM-LR.CRCM5-UQAM', 'CanESM2.CRCM5-UQAM'... | s3://ncar-na-cordex/day/tmax.rcp45.day.NAM-44i... |
[25]:
# Load catalog entries for subset into a dictionary of xarray datasets, and open the first one.
dsets = col_subset.to_dataset_dict(
xarray_open_kwargs={"consolidated": True}, storage_options={"anon": True}
)
print(f"\nDataset dictionary keys:\n {dsets.keys()}")
# Load the first dataset and display a summary.
dataset_key = list(dsets.keys())[0]
store_name = dataset_key + ".zarr"
ds = dsets[dataset_key]
ds
# Note that the summary includes a 'member_id' coordinate, which is a renaming of the
# 'na-cordex-models' column in the catalog.
--> The keys in the returned dictionary of datasets are constructed as follows:
'variable.frequency.scenario.grid.bias_correction'
Dataset dictionary keys:
dict_keys(['tmax.day.rcp45.NAM-44i.raw'])
[25]:
<xarray.Dataset>
Dimensions: (lat: 129, lon: 300, member_id: 6, time: 34698, bnds: 2)
Coordinates:
* lat (lat) float64 12.25 12.75 13.25 13.75 ... 74.75 75.25 75.75 76.25
* lon (lon) float64 -171.8 -171.2 -170.8 ... -23.25 -22.75 -22.25
* member_id (member_id) <U21 'MPI-ESM-LR.CRCM5-UQAM' ... 'CanESM2.CanRCM4'
* time (time) datetime64[ns] 2006-01-01T12:00:00 ... 2100-12-31T12:00:00
time_bnds (time, bnds) datetime64[ns] dask.array<chunksize=(17349, 2), meta=np.ndarray>
Dimensions without coordinates: bnds
Data variables:
tmax (member_id, time, lat, lon) float32 dask.array<chunksize=(4, 1000, 65, 150), meta=np.ndarray>
Attributes: (12/41)
CORDEX_domain: NAM-44
contact: {"MPI-ESM-LR.CRCM5-UQAM": "Winger.K...
creation_date: {"MPI-ESM-LR.CRCM5-UQAM": "2012-09-...
driving_experiment: {"MPI-ESM-LR.CRCM5-UQAM": "MPI-M-MP...
driving_experiment_name: rcp45
driving_model_ensemble_member: {"MPI-ESM-LR.CRCM5-UQAM": "r1i1p1",...
... ...
intake_esm_attrs:vertical_levels: 1
intake_esm_attrs:bias_correction: raw
intake_esm_attrs:na-cordex-models: ['MPI-ESM-LR.CRCM5-UQAM', 'CanESM2....
intake_esm_attrs:path: s3://ncar-na-cordex/day/tmax.rcp45....
intake_esm_attrs:_data_format_: zarr
intake_esm_dataset_key: tmax.day.rcp45.NAM-44i.raw[26]:
ds.tmax \
.sel(lat=45, lon=-75, method='nearest') \
.hvplot(x='time',by='member_id', width=750, height=500, grid=True) \
.opts(legend_position='bottom')
[26]: